** This do-file first imputes missing data on total government spending and government spending on health, and then uses these data to distribute public goods spending as equal lump sums to all citizens.

* loading WID data on public goods spending
use "WID_public_goods_spending_2021_03_26.dta", clear

* creating country var
ren country cntry
kountry cntry, from(iso2c)
ren NAMES_STD country

* keeping only data on population size, national income, final consumption expenditure, and expenditures on health
keep if variable=="npopul992i" | variable=="mnninc999i" | variable=="mcongo999i" | variable=="mheago999i"

drop age percentile pop 

* reformatting
reshape wide value, i(year cntry) j(variable) string

foreach v of varlist value* {
   	local newname = substr("`v'", 6, .)
   	rename `v' `newname'
}

* Imputing missing data on public goods spending:

* first, we linearly interpolate missing values on expenditures
sort cntry year
bysort cntry: ipolate mcongo999i  year, g(imcongo999i) 
bysort cntry: ipolate mheago999i  year, g(imheago999i) 
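
* Optional check (an illustrative sketch, not part of the original pipeline): count how many
* country-years ipolate filled in, i.e. values that are missing in the raw series but present
* in the interpolated one. It uses only the variables created above.
count if missing(mcongo999i) & !missing(imcongo999i)
count if missing(mheago999i) & !missing(imheago999i)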

drop if year<1980

* saving data 
save "WID_final_and_health_exp.dta", replace

*** Next, we extrapolate the series based on the trends in neighbouring countries 

*Imputation of spending: preparing for imputation
drop country

sort cntry  year
encode cntry, g(c1)

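 * number of countries, taken from the encoded variable rather than hard-coded
 su c1, meanonly
 local ncntry = r(max)

 * index of health spending relative to its level in 2000
 g h=.
 forval n=1/`ncntry' {
 su imheago999i if c1==`n'  & year==2000
 replace h=imheago999i/r(mean) if c1==`n'
 }
 
 * index of overall spending relative to its level in 2000
 g all=.
 forval n=1/`ncntry' {
 su imcongo999i if c1==`n'  & year==2000
 replace all=imcongo999i/r(mean) if c1==`n'
 }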
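
 * Optional check (an illustrative sketch, not part of the original pipeline): by construction,
 * both indices should equal 1 in the base year 2000 wherever they are defined. The 1e-6
 * tolerance is an assumption that allows for float rounding.
 assert abs(h-1)<1e-6 if year==2000 & !missing(h)
 assert abs(all-1)<1e-6 if year==2000 & !missing(all)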

  keep cntry year all h
 
 
 reshape wide all h, i(year) j(cntry) string
 sort year // the [_n+1] references below assume ascending year order
 
 * Denmark (DK), Norway (NO), and Sweden (SE)
 * Health: the reference trend is the average of SE and FI
 g hSEFI=(hSE+hFI)/2
 
 *Health: DK
  foreach n of numlist 1983 1982 1981 1980 {
 replace hDK=hDK[_n+1]*(1-((hSEFI[_n+1]-hSEFI)/hSEFI)) if year==`n'
 }
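 
 * Worked example of one backward step (hypothetical numbers, for illustration only):
 * if the reference index rises from 0.90 in year t to 0.99 in year t+1 (growth of 10%)
 * and the target index is 1.10 in year t+1, the loop above sets the target in year t to
 * 1.10*(1-0.10), i.e. about 0.99. The loop runs from the latest missing year backwards,
 * so each step can use the value filled in by the previous step.
 display 1.10*(1-((0.99-0.90)/0.90))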
 
 *Health: NO
   foreach n of numlist 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace hNO=hNO[_n+1]*(1-((hSEFI[_n+1]-hSEFI)/hSEFI)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con hDK hSE hNO  hFI hSEFI year // works
 
 * overall spending: SE
 g allDNF=(allDK+allFI+allNO)/3 // average of DK, FI, and NO
 
 foreach n of numlist 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace allSE=allSE[_n+1]*(1-((allDNF[_n+1]-allDNF)/allDNF)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con allDK allSE allNO allFI  year // works
 
  * Overall spending in Ireland; based on trend in GB
 foreach n of numlist 1984 1983 1982 1981 1980 {
 replace allIE=allIE[_n+1]*(1-((allGB[_n+1]-allGB)/allGB)) if year==`n'
 }
 
 * Health spending in Ireland; based on trend in GB
 foreach n of numlist 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace hIE=hIE[_n+1]*(1-((hGB[_n+1]-hGB)/hGB)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con allIE allGB  year // works
 *twoway con hIE hGB    year // works
 
  
 * Health France, based on overall spending in France
 foreach n of numlist 1982 1981 1980 {
 replace hFR=hFR[_n+1]*(1-((allFR[_n+1]-allFR)/allFR)) if year==`n'
 }
 **Check of coding: 
  *twoway con hFR allFR   year // works
 
 * Health Netherlands, based on overall spending in Netherlands
  foreach n of numlist 1994 1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace hNL=hNL[_n+1]*(1-((allNL[_n+1]-allNL)/allNL)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con hNL allNL   year // works
 
 * Overall spending Switzerland, based on trends in DE, AT, BE, and FR
 g allcont=(allDE+allAT+allBE+allFR)/4
  
  foreach n of numlist  1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace allCH=allCH[_n+1]*(1-((allcont[_n+1]-allcont)/allcont)) if year==`n'
 }
 
 **Check of coding: 
  *twoway con allCH allcont  year // works
 
 **health in Switzerland, based on trends in DE and AT
 g hDEAT=(hDE+hAT)/2
 foreach n of numlist 1994 1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace hCH=hCH[_n+1]*(1-((hDEAT[_n+1]-hDEAT)/hDEAT)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con hCH hDE hAT hBE hFR allCH hDEAT   year // works
 
 ** Spain
 * health, based on trends in IT and PT
 
 
 g hITPT=(hIT+hPT)/2
  foreach n of numlist 1984 1983 1982 1981 1980 {
 replace hES=hES[_n+1]*(1-((hITPT[_n+1]-hITPT)/hITPT)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con hES hIT hPT hFR hITPT   year // works
 
 * Spain overall spending, based on trends in IT, PT, and FR
 
 g allITPTFR=(allIT+allPT+allFR)/3
  
 foreach n of numlist 1994 1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980 {
 replace allES=allES[_n+1]*(1-((allITPTFR[_n+1]-allITPTFR)/allITPTFR)) if year==`n'
 }
 
 **Check of coding: 
 *twoway con allES hES allIT allPT allFR   year // works
 
 
 * reshaping to long
 reshape long all h, i(year) string
 
 ren _j cntry
 sort cntry year
 drop if cntry=="DEAT" | cntry=="DNF" | cntry=="ITPT" | cntry=="ITPTFR" | cntry=="SEFI" | cntry=="cont"

 
kountry cntry, from(iso2c)
ren NAMES_STD country

* merging with WID spending data that we saved above
merge 1:1 country year using "WID_final_and_health_exp.dta"
drop _merge
drop country 

* reformatting
levelsof cntry, local(levels)
reshape wide all h imcongo999i imheago999i mcongo999i mheago999i mnninc999i npopul992i, i(year) j(cntry) string
 
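 ** imputing overall spending: level in 2000 times the extrapolated index
 * (the base value is looked up by year rather than by row position)
foreach p of local levels {
 su imcongo999i`p' if year==2000, meanonly
 replace imcongo999i`p'=r(mean)*all`p' if imcongo999i`p'==.
 }

 ** imputing health spending in the same way
 foreach p of local levels {
 su imheago999i`p' if year==2000, meanonly
 replace imheago999i`p'=r(mean)*h`p' if imheago999i`p'==.
 }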
 
  * reshaping to long
 reshape long all h imcongo999i imheago999i mcongo999i mheago999i mnninc999i npopul992i, i(year) string
 
 ren _j cntry
 sort cntry year
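
* Optional check (an illustrative sketch, not part of the original pipeline): count the
* country-years that are still missing after the imputation above.
count if missing(imcongo999i)
count if missing(imheago999i)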

* generating country var 
kountry cntry, from(iso2c)
ren NAMES_STD country

* keep only spending vars
keep country year imcongo999i imheago999i
ren imcongo999i itotal_benefits
ren imheago999i itotal_health

save "WID_imputed_public_goods_spending.dta", replace
 
*** We now calculate income shares when all public goods are distributed as an equal lump sum

* loading WID data on public goods spending
use "WID_public_goods_spending_2021_03_26.dta", clear

* generating country var
kountry country, from(iso2c)
ren country cntry
ren NAMES_STD country

* keep only the vars we're going to use
keep if variable=="npopul992i" | variable=="adiinc992j" | variable=="acainc992j" | variable=="aptinc992j" | variable=="mnninc999i" | variable=="mcongo999i" | variable=="mheago999i"

replace variable="apre_" if variable=="aptinc992j"
replace variable="apost_" if variable=="adiinc992j"
replace variable="apdisp_" if variable=="acainc992j"

gen var=variable+percentile

drop if year<1980
drop age cntry percentile pop variable

* reformatting
reshape wide value, i(year country) j(var) string

foreach v of varlist value* {
   	local newname = substr("`v'", 6, .)
   	rename `v' `newname'
}

ren npopul992ip0p100 n_adults // number of adults
ren mnninc999ip0p100 total_ni // total national income
ren mheago999ip0p100 total_health // total health related expenditures
ren mcongo999ip0p100 total_benefits // total public goods expenditures

* number of adults in different income groups
g n_1p=0.01*n_adults
g n_10p=0.1*n_adults
g n_20p=0.2*n_adults
g n_30p=0.3*n_adults
g n_40p=0.4*n_adults

** macroeconomic total by group: average income * number of adults in group
* deciles p0p10 to p90p100 (each contains 10% of adults)
foreach d in p0p10 p10p20 p20p30 p30p40 p40p50 p50p60 p60p70 p70p80 p80p90 p90p100 {
	g mpre_`d'=apre_`d'*n_10p
	g mpdisp_`d'=apdisp_`d'*n_10p
	g mpost_`d'=apost_`d'*n_10p
}

*p99p100
g mpre_p99p100=apre_p99p100*n_1p
g mpdisp_p99p100=apdisp_p99p100*n_1p
g mpost_p99p100=apost_p99p100*n_1p

* b20
g mpre_p0p20=mpre_p0p10+mpre_p10p20 
g mpdisp_p0p20=mpdisp_p0p10+mpdisp_p10p20 
g mpost_p0p20=mpost_p0p10+mpost_p10p20 

* b30
g mpre_p0p30=mpre_p0p10+mpre_p10p20+mpre_p20p30 
g mpdisp_p0p30=mpdisp_p0p10+mpdisp_p10p20+mpdisp_p20p30 
g mpost_p0p30=mpost_p0p10+mpost_p10p20+mpost_p20p30 

* m20
g mpre_p40p60=mpre_p40p50+mpre_p50p60 
g mpdisp_p40p60=mpdisp_p40p50+mpdisp_p50p60 
g mpost_p40p60=mpost_p40p50+mpost_p50p60 

* m40
g mpre_p30p70=mpre_p30p40+mpre_p40p50+mpre_p50p60+mpre_p60p70 
g mpdisp_p30p70=mpdisp_p30p40+mpdisp_p40p50+mpdisp_p50p60+mpdisp_p60p70 
g mpost_p30p70=mpost_p30p40+mpost_p40p50+mpost_p50p60+mpost_p60p70

* t20
g mpre_p80p100=mpre_p80p90+mpre_p90p100 
g mpdisp_p80p100=mpdisp_p80p90+mpdisp_p90p100 
g mpost_p80p100=mpost_p80p90+mpost_p90p100 

* t30
g mpre_p70p100=mpre_p70p80+mpre_p80p90+mpre_p90p100 
g mpdisp_p70p100=mpdisp_p70p80+mpdisp_p80p90+mpdisp_p90p100 
g mpost_p70p100=mpost_p70p80+mpost_p80p90+mpost_p90p100  


* income share:
foreach v of varlist mpre_p0p20 mpdisp_p0p20 mpost_p0p20 mpre_p0p30 mpdisp_p0p30 mpost_p0p30  mpre_p40p60 mpdisp_p40p60 mpost_p40p60 mpre_p30p70 mpdisp_p30p70 mpost_p30p70  mpre_p70p100 mpdisp_p70p100 mpost_p70p100  mpre_p80p100 mpdisp_p80p100 mpost_p80p100  mpre_p90p100 mpdisp_p90p100 mpost_p90p100 mpre_p99p100 mpdisp_p99p100 mpost_p99p100 {
	
	g share_`v'=round(`v'/total_ni, 0.0001)
}
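
* Optional check (an illustrative sketch, not part of the original pipeline): the ten decile
* totals add up to the total for the whole population, so the ratio of their sum to total_ni
* shows how closely the group totals reproduce national income. dsum_pre and dsum_pre_share
* are hypothetical helper variables and are dropped again.
egen double dsum_pre=rowtotal(mpre_p0p10 mpre_p10p20 mpre_p20p30 mpre_p30p40 mpre_p40p50 mpre_p50p60 mpre_p60p70 mpre_p70p80 mpre_p80p90 mpre_p90p100), missing
g double dsum_pre_share=dsum_pre/total_ni
su dsum_pre_share
drop dsum_pre dsum_pre_share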

merge 1:1 country year  using "WID_wide"
drop _merge

* keep only vars we're going to use below
keep country year share_* pre_* pdisp_* post_* n_* total_* mpre_* mpdisp_* mpost_* apre_* apdisp_* apost_*

* shares match those downloaded directly from the WID database, apart from a few rounding discrepancies
* (each comparison is restricted to observations where both series in the pair are non-missing)
foreach grp in p0p20 p0p30 p40p60 p30p70 p70p100 p90p100 p80p100 p99p100 {
	su pre_`grp' share_mpre_`grp' if share_mpre_`grp'!=. & pre_`grp'!=.
	su post_`grp' share_mpost_`grp' if share_mpost_`grp'!=. & post_`grp'!=.
}

**** merging with the imputed public goods spending series ****
merge 1:1 country year using "WID_imputed_public_goods_spending.dta"
drop _merge
drop if year<1980

* extend the public goods spending series for the US to 2019 to match the income data series
* (linear extrapolation; total spending is filled in for 2018 and 2019, health spending for 2019 only)
sort country year 
by country: ipolate itotal_benefits year, g(ietotal_benefits) epolate
by country: ipolate itotal_health year, g(ietotal_health) epolate

replace itotal_benefits=ietotal_benefits if country=="United States" & (year==2018 | year==2019)
replace itotal_health=ietotal_health if country=="United States" &  year==2019
drop ietotal_benefits ietotal_health

keep if post_p0p10!=. & pre_p0p10!=. // removing obs not in the final dataset

**** 
* creating var that captures public goods spending minus spending on health
g benefits_ex_health=itotal_benefits-itotal_health

** now we calculate income shares when all public goods spending is distributed as an equal lump sum
* to do so, we subtract the public goods spending (excl. health) that the WID allocates to a group
* (the WID distributes it in proportion to the group's share of post-tax disposable income)
* and then add the same spending back as an equal lump sum, i.e. in proportion to the group's population share.
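* Worked example (hypothetical numbers, for illustration only): suppose spending excluding
* health is 100 and the bottom 30% hold 15% of post-tax disposable income. The WID allocation
* gives the group 100*0.15 = 15, while the equal lump sum gives it 100*0.3 = 30, so the
* group's post-tax income rises by about 15.
display 100*0.3 - 100*0.15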

**** bottom 30% ****
* post-tax income of b30 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p0p30=mpost_p0p30-(benefits_ex_health*pdisp_p0p30)+(benefits_ex_health*0.3)

* b30's new income share
g share_lspost_p0p30=round(mlspost_p0p30/total_ni, 0.0001)

* b30's new average income
g alspost_p0p30=round(mlspost_p0p30/n_30p, 0.0001)

** check the results:
su share_lspost_p0p30  share_mpost_p0p30 post_p0p30

* average income
g apre_p0p30=(apre_p0p10+apre_p10p20+apre_p20p30)/3
g apdisp_p0p30=(apdisp_p0p10+apdisp_p10p20+apdisp_p20p30)/3
g apost_p0p30=(apost_p0p10+apost_p10p20+apost_p20p30)/3
su alspost_p0p30 apost_p0p30

**** bottom 20% ****
* post-tax income of b20 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p0p20=mpost_p0p20-(benefits_ex_health*pdisp_p0p20)+(benefits_ex_health*0.2)

* b20's new income share
g share_lspost_p0p20=round(mlspost_p0p20/total_ni, 0.0001)

* b20's new average income
g alspost_p0p20=round(mlspost_p0p20/n_20p, 0.0001)

** check the results:
su share_lspost_p0p20  share_mpost_p0p20 post_p0p20

* average income
g apre_p0p20=(apre_p0p10+apre_p10p20)/2
g apdisp_p0p20=(apdisp_p0p10+apdisp_p10p20)/2
g apost_p0p20=(apost_p0p10+apost_p10p20)/2
su alspost_p0p20 apost_p0p20

**** Middle 20% ****
* post-tax income of m20 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p40p60=mpost_p40p60-(benefits_ex_health*pdisp_p40p60)+(benefits_ex_health*0.2)

* m20's new income share
g share_lspost_p40p60=round(mlspost_p40p60/total_ni, 0.0001)

* m20's new average income
g alspost_p40p60=round(mlspost_p40p60/n_20p, 0.0001)

** check the results:
su share_lspost_p40p60  share_mpost_p40p60 post_p40p60

* average income
g apre_p40p60=(apre_p40p50+apre_p50p60)/2
g apdisp_p40p60=(apdisp_p40p50+apdisp_p50p60)/2
g apost_p40p60=(apost_p40p50+apost_p50p60)/2
su alspost_p40p60 apost_p40p60

**** Middle 40% ****
* post-tax income of m40 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p30p70=mpost_p30p70-(benefits_ex_health*pdisp_p30p70)+(benefits_ex_health*0.4)

* m40's new income share
g share_lspost_p30p70=round(mlspost_p30p70/total_ni, 0.0001)

* m40's new average income
g alspost_p30p70=round(mlspost_p30p70/n_40p, 0.0001)

** check the results:
su share_lspost_p30p70  share_mpost_p30p70 post_p30p70

* average income
g apre_p30p70=(apre_p30p40+apre_p40p50+apre_p50p60+apre_p60p70)/4
g apdisp_p30p70=(apdisp_p30p40+apdisp_p40p50+apdisp_p50p60+apdisp_p60p70)/4
g apost_p30p70=(apost_p30p40+apost_p40p50+apost_p50p60+apost_p60p70)/4
su alspost_p30p70 apost_p30p70


**** Top 1% ****
* post-tax income of t1 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p99p100=mpost_p99p100-(benefits_ex_health*pdisp_p99p100)+(benefits_ex_health*0.01)

* t1's new income share
g share_lspost_p99p100=round(mlspost_p99p100/total_ni, 0.0001)

* t1's new average income
g alspost_p99p100=round(mlspost_p99p100/n_1p, 0.0001)

** check the results:
su share_lspost_p99p100  share_mpost_p99p100 post_p99p100

su alspost_p99p100 apost_p99p100

**** Top 10% ****
* post-tax income of t10 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p90p100=mpost_p90p100-(benefits_ex_health*pdisp_p90p100)+(benefits_ex_health*0.1)

* t10's new income share
g share_lspost_p90p100=round(mlspost_p90p100/total_ni, 0.0001)

* t10's new average income
g alspost_p90p100=round(mlspost_p90p100/n_10p, 0.0001)

** check the results:
su share_lspost_p90p100  share_mpost_p90p100 post_p90p100

su alspost_p90p100 apost_p90p100


**** Top 20% ****
* post-tax income of t20 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p80p100=mpost_p80p100-(benefits_ex_health*pdisp_p80p100)+(benefits_ex_health*0.2)

* t20's new income share
g share_lspost_p80p100=round(mlspost_p80p100/total_ni, 0.0001)

* t20's new average income
g alspost_p80p100=round(mlspost_p80p100/n_20p, 0.0001)

** check the results:
su share_lspost_p80p100  share_mpost_p80p100 post_p80p100

su alspost_p80p100

**** Top 30% ****
* post-tax income of t30 with all in-kind benefits distributed as an equal lump-sum
g mlspost_p70p100=mpost_p70p100-(benefits_ex_health*pdisp_p70p100)+(benefits_ex_health*0.3)

* t30's new income share
g share_lspost_p70p100=round(mlspost_p70p100/total_ni, 0.0001)

* t30's new average income
g alspost_p70p100=round(mlspost_p70p100/n_30p, 0.0001)

** check the results:
su share_lspost_p70p100  share_mpost_p70p100 post_p70p100

su alspost_p70p100

save "WID_income_data.dta", replace

 
